Abstract

Practical face recognition has been studied for decades but remains an open challenge. Current prevailing approaches have achieved substantial breakthroughs in recognition accuracy. However, their performance usually drops dramatically if face samples are severely misaligned. To address this problem, we propose a highly efficient misalignment-robust locality-constrained representation (MRLR) algorithm for practical real-time face recognition. Specifically, the locality constraint, which activates the most correlated atoms and suppresses the uncorrelated ones, is applied to construct the dictionary for face alignment. We then simultaneously align the warped face and update the locality-constrained dictionary, eventually obtaining the final alignment. Moreover, we make use of the block structure to accelerate the derived analytical solution. Experimental results on public data sets show that MRLR significantly outperforms several state-of-the-art approaches in terms of efficiency and scalability, with even better performance.


Introduction 

Over the past years, face recognition has been, and still is, one of the most important and fundamental computer vision problems. Significant progress has been made in face recognition, ranging from the family of sparse representation methods (Wright et al. 2009; Wagner et al. 2012; Zhang, Yang, and Feng 2011) to the application of deep convolutional neural networks (CNN) (Sun, Wang, and Tang 2014a; Sun et al. 2014; Taigman et al. 2014). While achieving impressive recognition accuracy in controlled environments (some of them even surpass human performance at certain tasks), most of them also show strong robustness to occlusion and illumination. However, these algorithms largely depend on well-aligned training and testing samples. Research (Shan et al. 2004) has demonstrated that even slight misalignment can globally transform the entire image, greatly reducing the recognition accuracy. Even the CNN that achieves state-of-the-art performance nowadays needs to align the training and testing faces to the same position, since misaligned query faces can greatly degrade its performance (Schroff, Kalenichenko, and Philbin 2015).


* indicates equal contributions. 
Figure 1. An illustration of MRLR. The red bounding box denotes the input estimate and the green one is the output estimate. The whole algorithm works iteratively.

Thus current face recognition techniques can benefit from 
an efficient and well-performing face alignment algorithm. 

In this paper, we only consider face alignment methods based on subspace learning and sparse representation (Huang, Huang, and Metaxas 2008; Yang, Zhang, and Zhang 2012; Wagner et al. 2012). Although there are other types of face image registration methods that can handle larger face variation in expression and pose, e.g. active appearance models (Cootes, Edwards, and Taylor 2001), active shape models (Cootes et al. 1995) and unsupervised joint alignment (Huang, Jain, and Learned-Miller 2007), their complexity is usually too high for efficient alignment, while ours is far more efficient and suitable for real-time situations. Besides, facial-landmark-based methods only focus on accurately detecting the facial key points, while ours aligns the face based on the whole set of training samples and focuses on benefiting the subsequent recognition.


Related Work 

(Wright et al. 2009) reported sparse representation based classification (SRC), which seeks to represent an aligned testing image by a sparse linear combination of training images. The basic assumption of SRC is that all the training and testing samples are well aligned, so SRC performs poorly with misaligned faces. To overcome this shortcoming, (Huang, Huang, and Metaxas 2008) proposed the transform-invariant sparse representation (TSR). They add deformations to the training set, simultaneously recovering the image transformation and the representation coefficients. However, TSR aligns the testing image to a global dictionary and thus easily gets trapped in local minima. To avoid that, robust alignment by sparse representation (RASR) (Wagner et al. 2012) aligns the testing image to the training samples of each subject, then warps the training set and the testing image to a unified transformation for recognition. The exhaustive subject-by-subject search effectively finds the global optimum, but it is extremely time-consuming, especially when the number of subjects is large. RASR is therefore detrimental to efficiency and scalability. (Yang, Zhang, and Zhang 2012) proposed the efficient misalignment-robust representation (MRR) for face recognition. With a carefully controlled training set, they perform the singular value decomposition (SVD) and use principal components to approximate the global dictionary, significantly enhancing its real-time ability. However, the SVD operation therein is still time and memory consuming, preventing MRR from being applied to large-scale datasets. Using the principal components of the dictionary instead of the original one also inevitably reduces alignment accuracy.

Figure 2. An illustration of the MRLR iteration procedure. The left image is the input uncropped face. We use the Viola-Jones detector to generate an initial estimate and obtain the locality-constrained dictionary (LCD) for it; the LCD is then used to compute the new transformation. We iteratively update the face transformation and the corresponding LCD until convergence. The bar plot denotes the label distribution of the LCD. It shows that the LCD contains increasingly more training samples from the same class as the testing face after each iteration.

Motivations and Contributions 

In summary, current prevailing sparse representation based face alignment methods have several major shortcomings:

• Time-consuming: (Wagner et al. 2012; Zhuang et al. 2013) need to align the face in an exhaustive manner using sub-dictionaries that are constructed from every individual. If the dataset contains more than a thousand individuals, these algorithms run extremely slowly.

• Easy to introduce background noise: (Wagner et al. 2012; Zhuang et al. 2013) align the training set to the testing sample, which may introduce background noise if the testing sample is largely off-centered and break the low-dimensional linear illumination model (Basri and Jacobs 2003).

• Unsatisfactory subspace: (Huang, Huang, and Metaxas 2008; Yang, Zhang, and Zhang 2012) use the global dictionary to perform the alignment. The global dictionary contains various uncorrelated face samples and produces an unsatisfactory subspace for alignment.

• Unable to benefit from outside data.

In order to address the above problems, we propose a misalignment-robust locality-constrained representation (MRLR) for robust face recognition. Fig. 1 briefly illustrates MRLR. Inspired by locality-constrained linear coding (Wang et al. 2010), locality is introduced to the dictionary construction for alignment. Specifically, we add a locality adaptor to the $\ell_2$-regularized penalty on $x$. Because we also use the $\ell_2$ norm to constrain $e$, an efficient analytical solution can be derived. While updating the face transformation, we simultaneously update the locality-constrained dictionary, as shown in Fig. 2. Our contributions are summarized as follows.

• The proposed locality-constrained representation avoids the exhaustive search over every subject of the training set, greatly reducing the computational time and making the alignment scalable to large datasets. To the best of our knowledge, this is the first time that locality has been introduced to improve the performance of face alignment.

• MRLR uses the locality adaptor and the $\ell_2$-norm to penalize the representation term and the error term. We derive an analytical solution for the optimization. Moreover, we can accelerate the analytical solution by making use of the block structure of the deformable dictionary. Thus the inversion of a large matrix can be avoided, making our model even more scalable and efficient.

• MRLR simultaneously optimizes the transformation and updates the corresponding locality-constrained dictionary, which largely avoids unsatisfactory local minima.

• MRLR can take advantage of outside data to better construct the locality-constrained dictionary. Outside data can be effectively used to benefit the alignment performance.

The Proposed Method 
The MRLR Model 

We arrange the given $n_i$ training samples from the $i$th class as columns of a dictionary $D_i = [d_{i,1}, d_{i,2}, \cdots, d_{i,n_i}] \in \mathbb{R}^{m \times n_i}$, where $d_{i,j} \in \mathbb{R}^m$ denotes the $j$th vectorized training sample of the $i$th class. Combining the dictionaries of all subjects, we obtain a global dictionary $D = [D_1, D_2, \cdots, D_k] \in \mathbb{R}^{m \times n}$, where $n = \sum_{i=1}^{k} n_i$ and $k$ is the number of subjects. Suppose that the query face $y$ belongs to the $i$th subject; ideally it can be approximately represented by $D_i x$, in which $x$ is the vector of representation coefficients. However, due to the misalignment problem, such a linear subspace representation may be invalid. Therefore we introduce a transformation that models the warping of the original face. Instead of observing $y$, we observe the warped face $y_w = y \circ \tau^{-1}$, where $\circ$ denotes a nonlinear operator and $\tau$ belongs to a finite-dimensional group of transformations acting on the image domain (e.g. similarity transformation). The linear subspace representation $x$ of the warped face cannot reveal the true identity, so naively applying a recognition algorithm is inappropriate. On the other hand, the potential subspace corresponding to $y$ is also unknown, so it is difficult to align it. Fortunately, by leveraging the high similarity of faces, we can construct a suboptimal local dictionary for alignment and update the local dictionary according to the latest transformation. After several iterations, it eventually converges to the accurate transformation. After the true deformation $\tau^{-1}$ is found, we can apply its inverse $\tau$ to the testing face and obtain the aligned face $y_w \circ \tau$.

(b) Uncorresponding Sub-Dictionary, Running Time: 1.211s
(d) Corresponding Sub-Dictionary, Running Time: 1.131s
(e) Facial Landmarks based Methods, Running Time: 1.596s
(f) MRLR, Running Time: 0.636s

Figure 3. A face alignment example using different constraints, a key-point-based method and MRLR. The red box is the initial estimate and the green box denotes the final alignment. After performing face alignment, we use SRC to perform the face recognition. The stem diagram on the right shows the corresponding sparse representation coefficients of the aligned face after performing SRC. We can see that only (d) and (f) show discriminative representation results. Note that the corresponding sub-dictionary in (d) means we only use the sub-dictionary that is constituted by faces whose label is the same as the testing face. The uncorresponding sub-dictionary in (c) is constituted by faces that do not belong to the testing face.

The global dictionary with the $\ell_1$/$\ell_2$ constraint usually recovers an unsatisfactory transformation (see Fig. 3), because it is prone to local minima under the interference of atoms from the other subjects. Inspired by (Wang et al. 2010), we introduce locality constraints to the dictionary. The reason is twofold. First, the locality-constrained dictionary only uses the atoms most similar to the query, effectively avoiding unsatisfactory local minima caused by dissimilar atoms, as shown in Fig. 3. Second, using the locality-constrained dictionary requires no exhaustive search over every subject and leads to a highly efficient solving algorithm. The model of MRLR is formulated as

$$\min_{x, e} \; \|c \odot x\|_2^2 + \|e\|_2^2 \quad \text{s.t.} \quad y_w \circ \tau = Dx + e \qquad (1)$$

where $\odot$ denotes the element-wise multiplication between two vectors, and $c \in \mathbb{R}^n$ is the locality adaptor that attaches different penalties to the coefficients $x$. The locality adaptor activates the most correlated atoms for the testing sample, while suppressing the uncorrelated ones. Unfortunately, the model in Eq. (1) is difficult to solve due to the non-linearity. A small deformation in the transformation $\tau$ can be linearized as $y_w \circ (\tau + \Delta\tau) \approx y_w \circ \tau + J \Delta\tau$, where $J = \frac{\partial}{\partial \tau} (y_w \circ \tau)$ is the Jacobian of $y_w \circ \tau$ with respect to $\tau$ and $\Delta\tau$ is the step in $\tau$. If an initial $\tau$ is given, we can repeatedly search for an optimal $\Delta\tau$ to update $\tau$ and $J$. A final transformation $\tau$ can then be obtained to align the warped image.
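For a fixed transformation, eliminating $e$ from Eq. (1) leaves a weighted ridge regression in $x$ whose minimizer is $x = (D^T D + \mathrm{diag}(c)^2)^{-1} D^T y$. The following NumPy sketch illustrates this closed form on synthetic data. The function name `locality_ridge`, the random dictionary and the hand-picked adaptor values are assumptions made for illustration only; they are not the authors' implementation, and the adaptor here is not computed by the rule used in Algorithm 1.

```python
import numpy as np

def locality_ridge(D, y, c):
    """Closed-form minimizer of ||c * x||_2^2 + ||y - D x||_2^2 (tau held fixed)."""
    C2 = np.diag(c ** 2)                      # C^T C for C = diag(c)
    return np.linalg.solve(D.T @ D + C2, D.T @ y)

# Synthetic example (illustrative data, not from the paper).
rng = np.random.default_rng(0)
D = rng.standard_normal((50, 8))              # 8 atoms of dimension 50
y = D[:, 2] + 0.01 * rng.standard_normal(50)  # query close to atom 2
c = np.full(8, 10.0)                          # heavy penalty on most atoms
c[2] = 0.1                                    # weak penalty on the correlated atom
x = locality_ridge(D, y, c)
print(np.round(x, 3))
```

Atoms with a small adaptor value keep large coefficients, while heavily penalized atoms are shrunk toward zero, which is the behavior the locality constraint is meant to produce.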

The efficiency of the MRLR model is twofold. First, we enforce $\ell_2$-norm constraints on both $c \odot x$ and $e$, and derive an analytical solution for MRLR, which is much faster than solving an $\ell_1$-norm minimization. In fact, the performance of the $\ell_2$ constraints is similar to that of the $\ell_1$ constraints in the case without occlusion (Zhang, Yang, and Feng 2011). Second, we take advantage of the block structure of matrices to design a highly efficient algorithm, which obtains exactly the same solution in shorter time. The MRLR algorithm is summarized as follows.


Algorithm 1 The MRLR algorithm for Face Alignment

Input: The dictionary of training samples $D$, the warped testing image $y_w$, the initial transformation $\tau_0$ (it can be obtained by any off-the-shelf face detector, e.g. the Viola-Jones detector), a constant a.

Output: The aligned face $y$

1: while not converged and the maximal iteration is not reached do
2:   Compute the locality adaptor: $c \leftarrow \exp(\ldots)$; for all $i$, $c_i \leftarrow \max(c) - c_i$.
3:   $j \leftarrow 1$.
4:   while not converged and the maximal iteration is not reached do
5:     $y_w(\tau_j) \leftarrow \frac{y_w \circ \tau_j}{\|y_w \circ \tau_j\|_2}$; $J \leftarrow \frac{\partial}{\partial \tau_j} y_w(\tau_j)$.
6:     $\Delta\tau = \arg\min_{\Delta\tau, x, e} \|c \odot x\|_2^2 + \|e\|_2^2$ s.t. $y_w(\tau_j) + J\Delta\tau = Dx + e$.
7:     $\tau_j \leftarrow \tau_{j-1} + \Delta\tau$.
8:     $j \leftarrow j + 1$.
9:   end while
10:  $\tau_0 \leftarrow \tau_j$.
11: end while
12: Output the final aligned face $y \leftarrow y_w \circ \tau_0$.


Efficient Solving Algorithm 


This section presents a highly efficient solution for the MRLR algorithm. By analyzing Algorithm 1, we find that the optimization in Step 6 dominates the overall computational time. Although it has an analytical solution, it involves the inversion of a large matrix. We aim to take advantage of the block structure of the matrix to decompose the inversion. We first reformulate the optimization in Step 6 as

$$\Delta\tau = \arg\min_{\Delta\tau, x, e} \|Cx\|_2^2 + \|e\|_2^2 \quad \text{s.t.} \quad y_w + J\Delta\tau = Dx + e \qquad (2)$$


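where $C$ is a diagonal matrix whose diagonal elements are the entries of the locality adaptor vector $c$. Substituting $e = y_w - [D, -J] \begin{bmatrix} x \\ \Delta\tau \end{bmatrix}$ into Eq. (2), we have

$$\begin{aligned}
\Delta\tau &= \arg\min_{x, \Delta\tau} \|Cx\|_2^2 + \left\| y_w - [D, -J] \begin{bmatrix} x \\ \Delta\tau \end{bmatrix} \right\|_2^2 \\
&= \arg\min_{x, \Delta\tau} \left\| \begin{bmatrix} y_w \\ 0 \end{bmatrix} - \begin{bmatrix} D & -J \\ C & 0 \end{bmatrix} \begin{bmatrix} x \\ \Delta\tau \end{bmatrix} \right\|_2^2 \\
&= \arg\min_{z} \| u - Rz \|_2^2
\end{aligned} \qquad (3)$$

where $u$, $R$ and $z$ denote $\begin{bmatrix} y_w \\ 0 \end{bmatrix}$, $\begin{bmatrix} D & -J \\ C & 0 \end{bmatrix}$ and $\begin{bmatrix} x \\ \Delta\tau \end{bmatrix}$ respectively. It becomes a least squares problem whose analytical solution is $z = (R^T R)^{-1} R^T u$. As one can see, the computational complexity is still high due to the large size of $R$. In fact, the efficiency and the scalability can be greatly boosted if we make good use of the block structure of the matrix $R$.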
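The least squares reformulation above can be reproduced directly by stacking the blocks. The sketch below builds $R$ and $u$ from synthetic $D$, $J$, $c$ and $y_w$ and solves for $z = [x; \Delta\tau]$ with a generic solver. The function name `solve_stacked` and the random inputs are illustrative assumptions; in practice $J$ would come from differentiating the warped image, which is not shown here.

```python
import numpy as np

def solve_stacked(D, J, c, y_w):
    """Solve min_z ||u - R z||_2^2 with R = [[D, -J], [C, 0]] and u = [y_w; 0]."""
    n = D.shape[1]
    p = J.shape[1]
    C = np.diag(c)
    R = np.block([[D, -J], [C, np.zeros((n, p))]])
    u = np.concatenate([y_w, np.zeros(n)])
    z, *_ = np.linalg.lstsq(R, u, rcond=None)
    return z[:n], z[n:]                       # x, delta_tau

# Synthetic example (illustrative sizes: m = 60 pixels, n = 10 atoms, p = 4 parameters).
rng = np.random.default_rng(1)
D = rng.standard_normal((60, 10))
J = rng.standard_normal((60, 4))
c = rng.uniform(0.5, 2.0, size=10)
y_w = rng.standard_normal(60)
x, delta_tau = solve_stacked(D, J, c, y_w)
print(delta_tau)
```

This direct route factorizes the full $(m + n) \times (n + p)$ matrix at every inner iteration, which is the cost the block-structured solution is designed to avoid.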


Using the block matrix inversion, we can rewrite the analytical solution as

$$z = (R^T R)^{-1} R^T u = \begin{bmatrix} D^T D + C^T C & -D^T J \\ -J^T D & J^T J \end{bmatrix}^{-1} \begin{bmatrix} D^T y_w \\ -J^T y_w \end{bmatrix} \qquad (4)$$

We denote $D^T D + C^T C$, $D^T J$ and $J^T J$ as $T_1$, $T_2$ and $T_3$ respectively. In particular, $T_1$ and $T_1^{-1}$ can be pre-calculated before the inner iteration from Step 4 to Step 9. The other variables $Z_1$ and $Z_2$ can be represented as $Z_1^{-1} = (T_1 - T_2 T_3^{-1} T_2^T)^{-1}$ and $Z_2^{-1} = (T_3 - T_2^T T_1^{-1} T_2)^{-1}$. Eq. (4) can be represented as

$$\begin{bmatrix} x \\ \Delta\tau \end{bmatrix} = \begin{bmatrix} Z_1^{-1} (D^T y_w) - T_1^{-1} T_2 Z_2^{-1} (J^T y_w) \\ Z_2^{-1} T_2^T T_1^{-1} (D^T y_w) - Z_2^{-1} (J^T y_w) \end{bmatrix} \qquad (5)$$
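The second block row of the block-inverse solution gives $\Delta\tau$ without ever forming $x$. A minimal NumPy sketch follows, assuming synthetic inputs and a hypothetical function name `delta_tau_block`; it computes $\Delta\tau = Z_2^{-1} (T_2^T T_1^{-1} D^T y_w - J^T y_w)$ and compares the result with a generic least squares solve of the stacked system.

```python
import numpy as np

def delta_tau_block(D, J, c, y_w):
    """Delta-tau from the block-inverse formula, skipping the coefficients x."""
    T1 = D.T @ D + np.diag(c ** 2)            # D^T D + C^T C (fixed in the inner loop)
    T2 = D.T @ J
    T3 = J.T @ J
    T1_inv_T2 = np.linalg.solve(T1, T2)       # T1^{-1} T2
    Z2 = T3 - T2.T @ T1_inv_T2                # small p x p Schur complement
    # T1 is symmetric, so (T1^{-1} T2)^T equals T2^T T1^{-1}.
    rhs = T1_inv_T2.T @ (D.T @ y_w) - J.T @ y_w
    return np.linalg.solve(Z2, rhs)

# Synthetic check against the stacked least squares problem (illustrative data).
rng = np.random.default_rng(2)
m, n, p = 60, 10, 4
D = rng.standard_normal((m, n))
J = rng.standard_normal((m, p))
c = rng.uniform(0.5, 2.0, size=n)
y_w = rng.standard_normal(m)

R = np.block([[D, -J], [np.diag(c), np.zeros((n, p))]])
u = np.concatenate([y_w, np.zeros(n)])
z_ref, *_ = np.linalg.lstsq(R, u, rcond=None)
dt_block = delta_tau_block(D, J, c, y_w)
print(np.allclose(dt_block, z_ref[n:]))
```

Only the $p \times p$ matrix $Z_2$ changes with $J$ in each inner iteration, and $p$ is the number of transformation parameters, so the per-iteration linear solve stays small when $T_1$ and its factorization are cached.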


Note that the purpose of face alignment is to search for a deformation step $\Delta\tau$, so computing $x$ is unnecessary. By not computing $x$, we can greatly reduce the computation. Moreover, as mentioned in (Wang et al. 2010), $c$ usually imposes a weak constraint on only a few atoms and suppresses most of the atoms, so we can simply keep the $s$ ($s \ll n$) smallest entries in $c$ and force the other entries to positive infinity. This strategy further accelerates the coding, as we present in the complexity analysis and experiments (this strategy is termed MRLR2, while the former one is termed MRLR1). A detailed complexity analysis is given in the supplementary material.
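Forcing an entry of $c$ to positive infinity drives the corresponding coefficient to zero, so one way to realize the truncation is to drop those columns and work with an $s$-atom sub-dictionary. The sketch below shows this selection step. The helper name `keep_s_smallest` and the toy values are assumptions for illustration, and treating the infinite penalty as column removal is our reading of the strategy rather than a detail stated in the text.

```python
import numpy as np

def keep_s_smallest(D, c, s):
    """Keep the s atoms with the smallest adaptor values; drop the rest."""
    idx = np.argpartition(c, s - 1)[:s]       # indices of the s smallest entries
    idx.sort()                                # restore the original atom order
    return D[:, idx], c[idx], idx

# Toy example (illustrative values only).
D = np.arange(15, dtype=float).reshape(3, 5)  # 5 atoms of dimension 3
c = np.array([5.0, 0.1, 3.0, 0.2, 9.0])
D_s, c_s, idx = keep_s_smallest(D, c, s=2)
print(idx, D_s.shape)
```

The reduced pair `D_s`, `c_s` can then be passed to the same solver in place of $D$ and $c$, so the size of $T_1$ drops from $n \times n$ to $s \times s$.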


Experiments 

We conduct experiments on face databases with controlled laboratory conditions, Extended Yale B (Georghiades, Belhumeur, and Kriegman 2001) and CAS-PEAL (Gao et al. 2008), to comprehensively evaluate MRLR in terms of region of attraction, recognition rate, running time and scalability. Then practical face recognition performance is evaluated on the Labeled Faces in the Wild (LFW) dataset (Huang et al. 2007). The experimental results show that MRLR achieves competitive performance with much less running time and scales better to large datasets. Moreover, MRLR is able to make use of outside data to improve alignment, benefiting the subsequent recognition in the scenario where only one sample per person is available.

Implementation details 

In MRLR2 and MRR, the length (the number of atoms) of the dictionary for alignment is fixed to 20 for fair comparison. We basically follow the same setting as in (Wagner et al. 2012): 10 classes are retained after the first stage in MRR and RASR, and one projection matrix of 500 rows is used in TSR. The illumination dictionary in (Zhuang et al. 2013) follows its original setup, and the number of illumination atoms is 30 in all experiments. For MRLR, the maximum numbers of outer-loop and inner-loop iterations are consistently set to 3 and 30, respectively. The $\ell_1$-minimization algorithm uses the Augmented Lagrange Multiplier method (Yang et al. 2010).

Figure 4. The region of attraction. (The amount of translation is given as a fraction of the distance between eyes.) (a) Translation in the Y-direction only, (b) Translation in the X-direction only, (c) In-plane rotation only, (d) Scale variation only.

The region of attraction 

The region of attraction evaluates the robustness against 2D deformations. We compare MRLR with TSR (Huang, Huang, and Metaxas 2008), RASR (Wagner et al. 2012), MRR (Yang, Zhang, and Zhang 2012) and SIT (Zhuang et al. 2013) on the Extended Yale B database, which includes 2414 images of 38 subjects. We use the uncropped images of 28 subjects in the experiments. 32 training images per category are randomly selected, and the rest are used for testing. All the training images are resized to 80 x 70. We have access to the ground-truth eye locations and add perturbations to them. Then we calculate the corresponding recognition accuracy under various initial transformations. One can see that MRLR performs well and stably within a certain range of misalignment, e.g. 20 percent translation in the x direction (14-16 pixels), 20 degrees of rotation or 30 percent scale variation. This significantly enhances the robustness of practical recognition, because the average misalignment of a face detector safely falls within 10 percent translation and 8 percent scale variation. TSR performs relatively poorly even under small perturbations, mainly because aligning the testing image to the entire training set is more prone to local minima, resulting in inaccurate alignment. Using a single sample per class, SIT achieves performance similar to TSR due to its limited representation ability. Compared to MRLR1, MRLR2 and RASR, MRR performs slightly worse in robustness to deformation. MRLR1 and MRLR2 perform almost the same as RASR, demonstrating that the locality-constrained representation effectively avoids local minima.


Face recognition in controlled environments 

We conduct the recognition experiments on both the Extended Yale B and CAS-PEAL datasets. For Extended Yale B, we adopt the same settings as in the previous section. For CAS-PEAL, 20 subjects were chosen, each of them including more than 32 images. We randomly selected 20 images per subject and resized them to 80 x 70 for training, then tested on the remaining 12 images. Because SIT (Zhuang et al. 2013) trains on a single sample per category, we also reduce the training set in MRLR to a single sample per category for fair comparison (termed MRLR-SS). The initial $\tau_0$ is automatically given by the Viola-Jones detector (Viola and Jones 2001). Table 1 gives the recognition rates and average running time.


As discussed above, unsatisfactory local minima in the global dictionary lead to poor performance (81.61%) for TSR. With fewer subjects (from 28 to 20), the local minima problem is alleviated and TSR is able to perform better. RASR performs very well on both the Extended Yale B dataset (92.42%) and the CAS-PEAL dataset (89.92%). However, such a subject-by-subject search is time-consuming: on average it costs 9.76 and 5.45 seconds per testing image when the number of subjects is 28 and 20, respectively. On the other hand, the recognition rate of MRLR2 is 92.68% and 90.43%, slightly better than RASR. Most importantly, it takes only 0.18 and 0.15 seconds to process a testing image, roughly 4, 55 and 41 times faster than MRR, RASR and TSR respectively. With a single sample per subject, SIT achieves 84.53% and 86.76% recognition rates on the two datasets respectively, better than the single-sample version of MRLR. Because the SIT dictionary for alignment consists of the illumination dictionary (outside samples) and a single training sample per class, it has the same scale as that of RASR, resulting in similar running time.
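The quoted speed-up factors can be reproduced from the Extended Yale B running times reported in Table 1. The short script below is only a worked arithmetic check; the dictionary name `yale_b_time` is an illustrative choice, and the numbers are copied from Table 1.

```python
# Running time (s) per testing image on Extended Yale B, copied from Table 1.
yale_b_time = {"TSR": 7.396, "RASR": 9.7587, "MRR": 0.7773, "MRLR2": 0.1783}

# Speed-up of MRLR2 relative to each baseline.
speedup = {
    name: yale_b_time[name] / yale_b_time["MRLR2"]
    for name in ("MRR", "RASR", "TSR")
}
for name, ratio in speedup.items():
    print(f"MRLR2 vs {name}: {ratio:.1f}x faster")
```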

Scalability 

We vary the number of subjects from 10 to 100 and resize the images from 40 x 35 to 160 x 140 to evaluate the scalability of our algorithm. Table 2 and Table 3 show the experimental results. One can observe that TSR, RASR and SIT cost too much time, far from being applicable in real-time systems. The running time of TSR remains relatively stable as the dimension increases, but rises linearly with more subjects. MRR maintains excellent real-time capability as the number of subjects grows. However, its running time rises dramatically when the image resolution increases. Unlike the above-mentioned approaches, MRLR1 and MRLR2 are not very sensitive to the dimension or the number of subjects, preserving competitive performance. MRLR2 has the lowest running time and the lowest rate of increase as we enlarge the dimension or the number of subjects, showing the best scalability among state-of-the-art approaches.

Table 1. The recognition accuracy and running time on Extended Yale B and CAS-PEAL datasets.

            Extended Yale B                          CAS-PEAL
Method      Recognition Rate (%)  Running Time (s)   Recognition Rate (%)  Running Time (s)
TSR         81.61                 7.396              86.96                 4.2695
RASR        92.42                 9.7587             89.92                 5.4466
MRR         90.95                 0.7773             90.00                 0.5684
SIT         84.53                 9.9823             86.76                 6.0329
MRLR-SS     77.12                 0.1566             81.31                 0.1384
MRLR1       92.31                 0.6207             89.76                 0.3307
MRLR2       92.53                 0.1783             90.43                 0.1462

Table 4. Recognition accuracy (%) on LFW dataset.


Table 2. Running time (s) under different dimensions (image size).

Method    40x35    64x56    80x70    120x105    160x140
TSR       3.645    3.861    4.270    4.672      5.468
RASR      3.499    4.452    6.110    10.324     17.111
MRR       0.133    0.342    0.593    2.259      5.997
SIT       3.564    4.637    6.565    11.035     19.215
MRLR1     0.085    0.195    0.331    0.569      0.940
MRLR2     0.066    0.118    0.146    0.303      0.505


Table 3. Running time (s) under different numbers of classes.

Method    10        20        40        70         100
TSR       2.1533    3.2825    5.5280    8.4034     11.5327
RASR      2.7377    4.6596    8.8647    15.4644    22.1281
MRR       0.5776    0.5928    0.6082    0.6394     0.6994
SIT       2.86      5.1996    9.9817    17.6875    27.1734
MRLR1     0.1977    0.2819    0.513     0.8552     1.4096
MRLR2     0.1318    0.1373    0.1559    0.197      0.2616


Face recognition and verification in the wild 

In this section, we test MRLR in a practical scenario. The LFW dataset contains 13,233 images of 5,749 people, of whom 4,069 have only one image. This dataset is very challenging since it is collected in an uncontrolled, in-the-wild scenario, including blur, varying illumination, age variation, occlusion and misalignment. We present two experiments, face recognition and face verification, on the LFW database to evaluate the performance of MRLR. In recognition testing, we choose 20 persons with more than 20 images, forming a subset with 1534 samples. We randomly select 20 samples per subject as the training set, and test on the rest. The experimental results are shown in Table 4. It is worth noting that many testing images include 3D deformation. Although aligning a 3D-warped image to a frontal face is beyond the scope of our approach, we do not manually exclude these images, for the purpose of evaluating the performance of our method in a practical scenario. It is clear that MRLR performs best among these misalignment-robust recognition algorithms, outperforming RASR by 1.55% in recognition rate. Furthermore, the single-sample version of MRLR beats SIT by a significant margin, mainly because the illumination dictionary is not informative enough to represent such sophisticated intra-class variation in each subject.

To address the problem of insufficient training images when aligning, we propose to use outside data to improve alignment. Making full use of the similarity of faces, outside data that belong to neither the training subjects nor the testing subjects also enhance the accuracy of alignment. With the outside data, MRLR performs better even in the scenario where there is only one sample per subject.

Method    Accuracy (%)    Method                       Accuracy (%)
TSR       73.63           MRLR-SS                      72.47
RASR      81.43           MRLR-SS with outside data    80.42
MRR       78.84           MRLR1                        82.98
SIT       55.91           MRLR2                        81.75

Face verification represents another task. Given two face images, the goal is to decide whether the two pictures belong to the same individual. Many breakthroughs have been achieved by convolutional neural networks (CNN) (Krizhevsky, Sutskever, and Hinton 2012; Simonyan and Zisserman 2014). However, the training data also need to be loosely or accurately aligned so that the verification accuracy can be further boosted. In (Sun, Wang, and Tang 2014a; Sun et al. 2014; Sun, Wang, and Tang 2014b), a similarity transformation is used to align training and testing images according to the facial landmarks. (Taigman et al. 2014; Schroff, Kalenichenko, and Philbin 2015) also state that accurate 3D alignment does help the subsequent face verification. In this experiment, we train a simple neural network consisting of 7 convolution layers and 2 fully connected layers, jointly supervised by softmax loss and contrastive loss. The specific deep model description is given in the supplementary material. The deep model is trained on roughly 600 thousand outside samples (these samples and LFW do not share the same individuals) and tested on LFW, following the standard unrestricted protocol. The features of each image are taken from the output of the first fully connected layer, and their Euclidean distance is calculated for binary classification. The pairs whose distance exceeds the threshold are regarded as negative. We compare the landmark-based method (IntraFace (Asthana et al. 2014)) and MRLR by aligning the testing image, and carry out ten-fold cross-validation testing on 6000 pairs. The results are reported in Table 5.
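The decision rule described above (a pair is negative when the Euclidean distance between its features exceeds a threshold) can be sketched as follows. The function name `same_identity`, the toy feature vectors and the threshold values are assumptions for illustration; the actual feature dimension and the threshold selection within the cross-validation folds are not specified in the text.

```python
import numpy as np

def same_identity(feat_a, feat_b, threshold):
    """Return True when the pair is accepted as the same person."""
    distance = np.linalg.norm(feat_a - feat_b)   # Euclidean (L2) distance
    return bool(distance <= threshold)           # exceeding the threshold means negative

# Toy features (illustrative only): the distance between them is exactly 5.
feat_a = np.array([0.0, 0.0])
feat_b = np.array([3.0, 4.0])
print(same_identity(feat_a, feat_b, threshold=5.0))
print(same_identity(feat_a, feat_b, threshold=4.9))
```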

Table 5. Verification accuracy (%) on LFW dataset.

Method       No. of points      Distance    Accuracy (%)
IntraFace    5                  L2          98.05±0.64
IntraFace    5                  PCA+L2      98.00±0.68
IntraFace    12                 L2          98.09±0.60
IntraFace    12                 PCA+L2      98.15±0.49
IntraFace    49                 L2          98.22±0.61
IntraFace    49                 PCA+L2      98.32±0.47
MRLR2        N/A (Image Set)    L2          98.68±0.51
MRLR2        N/A (Image Set)    PCA+L2      98.77±0.45


Concluding Remarks 

In this paper, we propose an efficient misalignment-robust locality-constrained representation algorithm, MRLR, for face alignment. The locality constraint therein avoids the interference of uncorrelated atoms and the exhaustive search over every subject, greatly reducing running time while still preserving accurate alignment. Moreover, motivated by the block structure of the dictionary, we propose an efficient solving algorithm to speed up the alignment. Besides, MRLR is easily extended to one-shot face alignment and can benefit from outside data. Computational complexity analysis and extensive experiments show that MRLR considerably reduces the running time with even better performance.